test/e2e: make collectPodLogs log dump best-effort #6315

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from mboersma:fix-collectpodlogs-best-effort
May 18, 2026

@mboersma mboersma commented May 17, 2026

What type of PR is this?

/kind flake

What this PR does / why we need it:

Mirrors the fix from #6265 (commit 4a8e9f1) for the sibling collectPodLogs helper. collectPodLogs runs from [AfterEach] to dump pod descriptions and container logs for the workload cluster, and currently uses Expect(...).To(Succeed()) when listing pods. That turns any transient unreachability of the workload cluster API server into a hard spec failure during teardown.

After #6265, collectNodes returns gracefully on i/o timeout, but collectPodLogs is the very next call in the AfterEach log-collection path and hits the same transient *.cloudapp.azure.com:6443: i/o timeout against the same load balancer. The result is that otherwise-successful runs of the apiversion-upgrade-v1beta1 job fail in [AfterEach] at test/e2e/azure_clusterproxy.go:112.

Recent example: 5 of 6 consecutive failures on the cherry-pick PR #6314 show STEP: PASSED! followed by [FAILED] in [AfterEach] - .../azure_clusterproxy.go:112, with the same i/o timeout signature against the management cluster LB in canadacentral.

Match the pattern already used a few lines below for streaming container logs (and now used in collectNodes): log the error and continue instead of failing the spec.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

This should be cherry-picked to release-1.24 so it can unblock #6314.

/cherry-pick release-1.24

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

test/e2e: make collectPodLogs log dump best-effort

The collectPodLogs helper runs from [AfterEach] to dump pod descriptions
and container logs for the workload cluster. It currently uses
Expect(...).To(Succeed()) when listing pods, which turns any transient
inability to reach the workload cluster API server into a hard spec
failure during teardown.

This is the same bug pattern that was fixed for the sibling collectNodes
helper in 4a8e9f1 (kubernetes-sigs#6265). With collectNodes now returning gracefully
on i/o timeout, collectPodLogs is the next call in the AfterEach log
collection path and hits the same transient unreachability against
*.cloudapp.azure.com:6443, causing otherwise-successful runs of the
apiversion-upgrade job to fail in [AfterEach].

Match the pattern already used a few lines below for streaming container
logs (and now used in collectNodes): log the error and continue instead
of failing the spec.

Signed-off-by: Matt Boersma <Matt.Boersma@microsoft.com>
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/flake Categorizes issue or PR as related to a flaky test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 17, 2026
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 17, 2026

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.84%. Comparing base (e8b9ce3) to head (9ab0c7a).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6315   +/-   ##
=======================================
  Coverage   43.84%   43.84%           
=======================================
  Files         289      289           
  Lines       25346    25346           
=======================================
  Hits        11114    11114           
  Misses      13458    13458           
  Partials      774      774           

@mboersma mboersma requested a review from willie-yao May 18, 2026 16:34
@mboersma
Contributor Author

/cherry-pick release-1.23

@k8s-infra-cherrypick-robot

@mboersma: once the present PR merges, I will cherry-pick it on top of release-1.23 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mboersma
Contributor Author

/cherry-pick release-1.24

@k8s-infra-cherrypick-robot

@mboersma: once the present PR merges, I will cherry-pick it on top of release-1.24 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.24

Contributor

@willie-yao willie-yao left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 18, 2026
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 255f03e310c28406dfb2b4f5ab11bb137bbbef6a

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: willie-yao

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2026
@k8s-ci-robot k8s-ci-robot merged commit 9857540 into kubernetes-sigs:main May 18, 2026
27 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 18, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in CAPZ Planning May 18, 2026
@mboersma mboersma deleted the fix-collectpodlogs-best-effort branch May 18, 2026 16:56
@k8s-infra-cherrypick-robot

@mboersma: new pull request created: #6317

In response to this:

/cherry-pick release-1.24

@k8s-infra-cherrypick-robot

@mboersma: new pull request created: #6318

In response to this:

/cherry-pick release-1.23

Labels

  • approved Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
  • kind/flake Categorizes issue or PR as related to a flaky test.
  • lgtm "Looks good to me", indicates that a PR is ready to be merged.
  • release-note Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

Status: Done

4 participants